Discussion of: Treelets — an Adaptive Multi-Scale Basis for Sparse Unordered Data
Abstract
We congratulate Lee, Nadler and Wasserman (henceforth LNW) on a very interesting paper on new methodology and supporting theory. Treelets seem to tackle two important problems of modern data analysis at once. For datasets with many variables, treelets give powerful predictions even if variables are highly correlated and redundant. Perhaps more importantly, the results are intuitive to interpret, and useful insights about relevant groups of variables can be gained.

Our comments and questions include:
(i) Could the success of treelets be replicated by a combination of hierarchical clustering and PCA?
(ii) When choosing a suitable basis, treelets seem to be largely an unsupervised method. Could the results be even more interpretable and powerful if treelets took a supervised response variable into account?
(iii) Interpretability of the result hinges on the sparsity of the final basis. Do we expect that the selected groups of variables will always be sufficiently small to be amenable to interpretation?

1. Treelets or hierarchical clustering combined with PCA. The core of the treelet algorithm achieves two objectives: (1) Variables are ordered in a hierarchical scheme. Highly correlated variables are typically "close" in the hierarchy. (2) A basis on the tree is chosen. Each node of the tree is associated with a "sum" variable (and also a "difference" variable). Clearly, treelets are more elegant than any method trying to achieve these two goals separately. As LNW write in Section 1: "The novelty and contribution of our approach is the simultaneous construction of a data-driven multi-scale orthogonal basis and a hierarchical cluster tree." We are left wondering, though, how different treelets are from the following scheme. First, variables are ordered in a hierarchical clustering scheme; for concreteness,
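To make the "sum" and "difference" variables concrete, here is a minimal sketch of a single treelet-style merge. It assumes that one merge consists of picking the two most correlated variables and applying a 2x2 local PCA (a Jacobi rotation) to them. The function names and the toy covariance matrix are hypothetical and chosen only for illustration; this is not LNW's implementation, and it does not build the full tree.

```python
import math


def correlation(cov, i, j):
    """Correlation coefficient of variables i and j from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def most_correlated_pair(cov, active):
    """Pair of active variables with the largest absolute correlation."""
    pairs = [(i, j) for k, i in enumerate(active) for j in active[k + 1:]]
    return max(pairs, key=lambda p: abs(correlation(cov, *p)))


def jacobi_merge(cov, i, j):
    """Rotate variables i and j so that they become uncorrelated.

    Returns (theta, var_sum, var_diff): the rotation angle and the variances
    of the resulting "sum" variable (first local principal component) and
    "difference" variable (second local principal component).
    """
    a, b, c = cov[i][i], cov[j][j], cov[i][j]
    # Angle that zeroes the covariance between the two rotated variables.
    theta = 0.5 * math.atan2(2.0 * c, a - b)
    cs, sn = math.cos(theta), math.sin(theta)
    var_sum = cs * cs * a + 2.0 * cs * sn * c + sn * sn * b
    var_diff = sn * sn * a - 2.0 * cs * sn * c + cs * cs * b
    return theta, var_sum, var_diff


# Hypothetical toy covariance: variables 0 and 1 are strongly correlated,
# variable 2 is only weakly related to both.
cov = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.1],
       [0.1, 0.1, 1.0]]

pair = most_correlated_pair(cov, [0, 1, 2])
theta, var_sum, var_diff = jacobi_merge(cov, *pair)
```

In this toy case the two merged variables have equal variance, so the rotation angle is 45 degrees: the "sum" variable is proportional to x0 + x1 and carries most of the pair's variance, while the "difference" variable is proportional to x1 - x0 and carries the small remainder. A clustering-then-PCA scheme of the kind the discussion asks about would separate these two steps rather than interleave them.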
Similar resources
Treelets—an Adaptive Multi-scale Basis for Sparse Unordered Data by Ann
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered—with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this pape...
arXiv:0707.0481v2 [stat.ME] 31 Aug 2007 Treelets — An Adaptive Multi-Scale Basis for Sparse Unordered Data
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered — with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this pa...
arXiv:0707.0481v1 [stat.ME] 3 Jul 2007 Treelets — An Adaptive Multi-Scale Basis for Sparse Unordered Data
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered — with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity; the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this pa...
Discussion Of: Treelets-an Adaptive Multi-scale Basis for Sparse Unordered Data.
We would like to congratulate Lee, Nadler and Wasserman on their contribution to clustering and data reduction methods for high p and low n situations. A composite of clustering and traditional principal components analysis, treelets is an innovative method for multi-resolution analysis of unordered data. It is an improvement over traditional PCA and an important contribution to clustering meth...
Discussion Of: Treelets—an Adaptive Multi-scale Basis for Sparse Unordered Data by Nicolai Meinshausen
We congratulate Lee, Nadler and Wasserman (henceforth LNW) on a very interesting paper on new methodology and supporting theory. Treelets seem to tackle two important problems of modern data analysis at once. For datasets with many variables, treelets give powerful predictions even if variables are highly correlated and redundant. Maybe more importantly, interpretation of the results is intuiti...
Publication date: 2007